Follow-ups: object-literal reconstruction pass + accurate brackets metric #63
Merged
The synthetic javascript-obfuscator (obfuscator.io) fixtures in samples/generated/ gated correctness via manifest.json but were excluded from the readability metrics: both the live report binary and the committed SCOREBOARD.md read samples/ non-recursively. Add a per-profile rollup of the generated corpus (aggregated over all seeds, one row per obfuscation technique) to both surfaces, so the obfuscator.io samples count toward readability the same way they count toward correctness. kept% is byte-weighted, opaque% is the mean per-file ratio, rounds is the worst case, and converged flags any non-fixpoint.
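The four aggregation rules for a rollup row can be sketched as below. This is a minimal illustration, not the code in report.rs: the `FileStats` and `ProfileRow` types, their field names, and the `rollup` function are hypothetical, and reading "byte-weighted" as total output bytes over total input bytes is an assumption.

```rust
/// Hypothetical per-file measurements; names are illustrative only.
#[derive(Debug, Clone)]
pub struct FileStats {
    pub input_bytes: u64,
    pub output_bytes: u64,
    /// Share of distinct identifiers that are still opaque, in 0.0..=1.0.
    pub opaque_ratio: f64,
    pub rounds: u32,
    pub converged: bool,
}

/// Hypothetical rollup row for one obfuscation profile.
#[derive(Debug, Clone, PartialEq)]
pub struct ProfileRow {
    pub kept_pct: f64,
    pub opaque_pct: f64,
    pub rounds: u32,
    pub converged: bool,
}

/// Aggregates every seed of one profile into a single row.
/// Returns `None` when there is nothing to aggregate.
pub fn rollup(files: &[FileStats]) -> Option<ProfileRow> {
    if files.is_empty() {
        return None;
    }
    let total_in: u64 = files.iter().map(|f| f.input_bytes).sum();
    let total_out: u64 = files.iter().map(|f| f.output_bytes).sum();

    // Byte-weighted (assumed meaning): large files dominate the ratio.
    let kept_pct = if total_in == 0 {
        0.0
    } else {
        100.0 * total_out as f64 / total_in as f64
    };

    // Unweighted mean of the per-file ratios: every file counts equally.
    let ratio_sum: f64 = files.iter().map(|f| f.opaque_ratio).sum();
    let opaque_pct = 100.0 * ratio_sum / files.len() as f64;

    // Worst case over all seeds.
    let rounds = files.iter().map(|f| f.rounds).max().unwrap_or(0);

    // A single non-fixpoint file marks the whole row as not converged.
    let converged = files.iter().all(|f| f.converged);

    Some(ProfileRow { kept_pct, opaque_pct, rounds, converged })
}
```

Mixing a byte-weighted column with an unweighted one is deliberate in this reading: the first answers "how much output is left overall", the second "how bad is a typical file".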
rename: RenameByRole now infers meaningful names for array-iteration callback params (reduce -> acc/value, map/filter/forEach -> item/index, sort -> left/right), C-style loop counters (-> index), and catch bindings (-> error), instead of falling back to generic varN. Names stay >=3 chars so they remain idempotent under the opaque-name guard, and reuse the existing scope de-duplicator.
report/golden: add a hexrefs column (raw, non-distinct count of _0x... identifier occurrences) to the live dashboard and committed scoreboard. opaque% counts DISTINCT tokens, so a single surviving decoder referenced N times barely moved it; hexrefs spikes when a string-array decoder is left intact, so the board now flags the worst failures (strarr_base64 163, strarr_rc4 211, numbers_keys 231, strong 385) instead of greenlighting them.
Snapshots/scoreboard re-blessed.
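The gap between a distinct count and a raw count is easy to see in a small scanner. The sketch below is not the implementation in report.rs or golden.rs: `hex_identifier_counts` is a hypothetical name, it treats any ASCII identifier that starts with `_0x` as a match, and it scans text lexically, so occurrences inside string literals or comments would be counted too.

```rust
use std::collections::HashSet;

/// ASCII-only approximation of a JavaScript identifier character.
fn is_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_' || b == b'$'
}

/// Returns `(raw, distinct)`: how many times an identifier starting with
/// `_0x` occurs, and how many different such names there are.
pub fn hex_identifier_counts(src: &str) -> (usize, usize) {
    let bytes = src.as_bytes();
    let mut raw = 0;
    let mut distinct: HashSet<&str> = HashSet::new();
    let mut i = 0;
    while i < bytes.len() {
        if is_ident_byte(bytes[i]) {
            // Consume one whole identifier-like word.
            let start = i;
            while i < bytes.len() && is_ident_byte(bytes[i]) {
                i += 1;
            }
            let word = &src[start..i];
            // `foo_0x1` and the numeric literal `0x10` do not start with `_0x`.
            if word.starts_with("_0x") && word.len() > 3 {
                raw += 1;
                distinct.insert(word);
            }
        } else {
            i += 1;
        }
    }
    (raw, distinct.len())
}
```

A decoder that survives and is called many times adds one entry to the distinct set but one hit per call to the raw count, which is why the raw column exposes it.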
…ional-chain members
dce: a string-array decoder's accessor memoizes through its own name
(if (f.flag===undefined){ f.cache={}; ... } ... f.cache[k] ...). After
every call site is inlined by decoder-lift, the only surviving references
to f are reads of its own properties inside its own body, which pinned
the spent decoder and its entire encoded string array alive forever.
fn_decl_is_dead now treats a function as dead when every resolved read of
its symbol is lexically inside its own body (shadowing-safe via reference
resolution), with the existing guard that a still-called self-reassigning
function is kept. Collapses the obfuscator.io string-array profiles:
strarr_rc4 kept 72%->19% (hexrefs 211->3), strarr_base64 68%->28%
(163->3); corpus output 328K->154K bytes, hexrefs 998->217.
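The new deadness rule can be modeled on a toy reference table. In this sketch `FnId`, `Read`, and `only_self_referenced` are hypothetical names rather than the types used by the pass, reads are assumed to be already resolved to symbols (which is what makes the check shadowing-safe), and the existing guard for still-called self-reassigning functions is left out.

```rust
/// Hypothetical symbol id of a function declaration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FnId(pub u32);

/// One resolved read of a function symbol.
#[derive(Debug, Clone)]
pub struct Read {
    /// The function symbol this read resolves to.
    pub target: FnId,
    /// Functions whose bodies lexically enclose the read, outermost first.
    pub enclosing: Vec<FnId>,
}

/// True when every resolved read of `f` sits inside `f`'s own body.
/// A function with no reads at all also satisfies this vacuously.
pub fn only_self_referenced(f: FnId, reads: &[Read]) -> bool {
    reads
        .iter()
        .filter(|r| r.target == f)
        .all(|r| r.enclosing.contains(&f))
}
```

Once every call site has been inlined, the memoizing decoder's remaining reads (`f.flag`, `f.cache`) all carry `f` in `enclosing`, so the predicate holds and the declaration, together with the string array only it uses, becomes removable.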
member-normalize: a?.["foo"] parses as a ChainElement, not an
Expression, so optional-chained computed members were never normalized.
Added enter_chain_element to rewrite them to a?.foo (identifier keys
only, optional flag preserved). Covered by a new phase1 test.
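The rewrite condition ("identifier keys only, optional flag preserved") can be illustrated on plain strings. This is a sketch and not the AST visitor itself: `normalize_member` and `is_identifier_name` are hypothetical helpers, and the identifier check is restricted to ASCII for brevity.

```rust
/// ASCII-only check that `key` can be written after a dot.
fn is_identifier_name(key: &str) -> bool {
    let mut chars = key.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' || c == '$' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '$')
}

/// Returns the dotted form of `object["key"]` or `object?.["key"]`,
/// or `None` when the key is not an identifier and the member must stay computed.
pub fn normalize_member(object: &str, key: &str, optional: bool) -> Option<String> {
    if !is_identifier_name(key) {
        return None;
    }
    // Keep the optional flag: `a?.["foo"]` becomes `a?.foo`, never `a.foo`.
    let dot = if optional { "?." } else { "." };
    Some(format!("{}{}{}", object, dot, key))
}
```

Returning `None` instead of a rendered fallback keeps the sketch from having to re-escape keys such as `"foo-bar"`; the caller simply leaves those nodes untouched.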
Snapshots/scoreboard re-blessed; full equivalence/determinism/corpus net green.
…-zft90c # Conflicts: # src/bin/report.rs # tests/golden.rs # tests/snapshots/SCOREBOARD.md
ReconstructObject (new pass): transformObjectKeys and hand packers lower
an object literal into an empty object plus a contiguous run of property
writes (var O = {}; O.a = …; O.b = …). This pass folds that run back into
the literal it came from. Beyond readability, it is the keystone for the
operator-proxy *tables* obfuscators build the same way: reconstructing
var t = {}; t.m = function(){…} into the { m: function(){…} } literal is
what lets proxy-inline recognize and collapse them, which folds the
opaque predicates guarding dead branches so DCE removes them. Sound by
construction: only an empty-object seed, only immediately-following
contiguous X.<staticKey> = <expr> writes, value may not reference X (it
is not yet bound in the literal), __proto__ and duplicate keys stop the
run. Runs before ProxyInline. Cuts corpus decoder residue 217 -> 149
hexrefs and shrinks the split-object profiles further.
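The fold and its stop conditions can be sketched over a toy statement list. The `Stmt` enum and `reconstruct_objects` below are hypothetical and far simpler than a real AST pass: values are kept as source text, and the identifiers a value mentions are assumed to be precomputed in `value_refs`.

```rust
/// Toy statement model; not the AST the real pass works on.
#[derive(Debug, Clone, PartialEq)]
pub enum Stmt {
    /// `var <name> = {};`
    EmptyObject { name: String },
    /// `<object>.<key> = <value>;` with the identifiers `value` mentions.
    PropWrite { object: String, key: String, value: String, value_refs: Vec<String> },
    /// `var <name> = { k: v, ... };` with properties in source order.
    ObjectLiteral { name: String, props: Vec<(String, String)> },
    /// Anything else; always ends a run.
    Other(String),
}

/// Folds each empty-object seed and the property writes that immediately
/// follow it into one object literal.
pub fn reconstruct_objects(stmts: Vec<Stmt>) -> Vec<Stmt> {
    let mut out = Vec::with_capacity(stmts.len());
    let mut iter = stmts.into_iter().peekable();
    while let Some(stmt) = iter.next() {
        let name = match stmt {
            Stmt::EmptyObject { name } => name,
            other => {
                out.push(other);
                continue;
            }
        };
        let mut props: Vec<(String, String)> = Vec::new();
        loop {
            // The run continues only while every soundness condition holds.
            let foldable = match iter.peek() {
                Some(Stmt::PropWrite { object, key, value_refs, .. }) => {
                    *object == name
                        && key.as_str() != "__proto__"
                        && !value_refs.contains(&name) // X is not bound inside the literal
                        && !props.iter().any(|(k, _)| k == key) // duplicate key stops the run
                }
                _ => false,
            };
            if !foldable {
                break;
            }
            if let Some(Stmt::PropWrite { key, value, .. }) = iter.next() {
                props.push((key, value)); // source order preserved
            }
        }
        if props.is_empty() {
            out.push(Stmt::EmptyObject { name });
        } else {
            out.push(Stmt::ObjectLiteral { name, props });
        }
    }
    out
}
```

A write that fails any condition is not consumed; it stays in the output as an ordinary statement after the literal, so the sketch never reorders or drops an assignment.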
brackets metric: counted raw [" occurrences, which are dominated by
array/object literals (= ["x"]), not member access — sample_10 read 96
when only ~6 are real accesses. Now counts [" only where the byte before
[ ends an expression (identifier char, ), ], or quote), i.e. genuine
string-keyed member access. Mirrored in report.rs and golden.rs.
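The revised counting rule amounts to a one-byte lookbehind. The sketch below is a hypothetical rendering of that rule, not the code in report.rs or golden.rs; `count_string_keyed_accesses` is an invented name, and treating both `"` and `'` as the "quote" case is an assumption.

```rust
/// True when `prev` can be the last byte of an expression, so a following
/// `[` is a member access rather than the start of an array literal.
fn ends_expression(prev: u8) -> bool {
    prev.is_ascii_alphanumeric()
        || matches!(prev, b'_' | b'$' | b')' | b']' | b'"' | b'\'')
}

/// Counts `["` occurrences that directly follow an expression-ending byte.
pub fn count_string_keyed_accesses(src: &str) -> usize {
    let b = src.as_bytes();
    let mut count = 0;
    // Start at 1 so `b[i - 1]` exists; stop early so `b[i + 1]` exists.
    for i in 1..b.len().saturating_sub(1) {
        if b[i] == b'[' && b[i + 1] == b'"' && ends_expression(b[i - 1]) {
            count += 1;
        }
    }
    count
}
```

Being purely byte-based, a rule like this is still a heuristic: a keyword immediately followed by an array literal, such as `return["x"]`, would be counted, whereas `= ["x"]` and `f(["x"])` are correctly skipped.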
New phase1 tests cover the fold and the proxy-table cascade. Snapshots
and scoreboard re-blessed; full equivalence/determinism/corpus net green.
…-zft90c # Conflicts: # tests/snapshots/SCOREBOARD.md # tests/snapshots/sample_11.js.out.js # tests/snapshots/sample_7.js.out.js
Follows up the open items from the deobfuscation-improvement effort (PR #62). Two concrete, general, sound changes; the larger items are deliberately deferred (see below).
## 1. `ReconstructObject`: new pass

`transformObjectKeys` (and hand packers) lower an object literal into an empty object plus a contiguous run of property writes (`var O = {}; O.a = …; O.b = …`). This pass folds that run back into the literal. Beyond readability, it's the keystone for proxy-table recovery: obfuscators build their operator-proxy tables the same way (`const t = {}; t.m = function(a,b){…}`), so reconstructing them into the `{ m: function… }` literal is what lets `proxy_inline` recognize and collapse them, which folds the opaque predicates guarding dead branches so DCE removes them.

Sound by construction: only an empty-object seed; only immediately-following contiguous `X.<staticKey> = <expr>` writes; the value may not reference `X` (it isn't bound yet inside the literal); `__proto__` and duplicate keys stop the run; property order is preserved so side-effect order is unchanged. Runs before `ProxyInline`.

Impact: corpus decoder residue 217 → 149 hexrefs; the split-object profiles shrink further (`numbers_keys` 71→37, `strong` 132→98). All 140 generated + 15 real samples still pass equivalence.

## 2. Accurate `brackets` metric

The metric counted raw `["` occurrences, which are dominated by array/object literals (`= ["x"]`), not member access: sample_10 read 96 when only ~6 are real accesses. It now counts `["` only where the byte before `[` ends an expression (identifier char, `)`, `]`, or a closing quote), i.e. genuine string-keyed member access. Mirrored in `report.rs` and `golden.rs`. sample_10 96→6; sample_7 576→528 (its residuals are genuine decoder-gated base64-key accesses like `)["BX52O0AVdwg="]`, correctly retained).

## Deliberately deferred (each its own effort)

- `strong`/`numbers_keys`: the remaining `_0x` names are mostly top-level object names that `RenameByRole` intentionally leaves (root-scope bindings), plus a few proxy cases needing predicate folding.
- `v1…v999` renaming: a size/readability tradeoff, not a clear win.

## Verification

New phase1 tests cover the fold and the proxy-table cascade. Full slow net green: golden (re-blessed), `sample_equivalence` (behavior preserved on every sample), `generated_corpus` (all 140 reproduce manifest output), `determinism` (5). `cargo clippy --all-targets` clean.

https://claude.ai/code/session_01EjhNTCU89wa5zaeRHMnfEc
Generated by Claude Code